An Efficient Comparison Technique for Sanitized XML Trees

Authors

  • Mohammad Ashiqur Rahaman
  • Yves Roudier
Abstract

When comparing different versions of large tree-structured data, detecting changes and generating the corresponding minimum-cost edit script is a CPU- and disk-I/O-intensive task. State-of-the-art techniques require the complete XML trees to be in memory and intermediate normalized trees to be computed before any comparison can start. Furthermore, these techniques do not address the comparison of sanitized XML trees. In this paper, we propose a comparison technique for sanitized XML documents that ultimately yields a minimum-cost edit script transforming an initial version of an XML tree into a target tree. The method uses encrypted integer labels that encode the original XML structure and content. The content of the sanitized XML is readable only by a legitimate party. Based on this encoding, any third party can compare the tree nodes on the fly without relying on intermediate normalized trees. It also allows partial comparison, as opposed to computing the full trees before starting any matching operation. To support our approach, we provide a modular algorithm describing the comparison technique, along with its complexity analysis.
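
The idea of comparing nodes through opaque labels can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify how its encrypted integer labels are built, so the sketch swaps in HMAC-SHA256 tokens, and it produces a simple positional edit script rather than a minimum-cost one. The names `Label`, `sanitize`, and `edit_script` are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Label(NamedTuple):
    position: str  # opaque token for the node's path in the tree
    digest: str    # opaque token for the node's tag and text


def _mac(key: bytes, data: str) -> str:
    # Keyed hash: stands in for the paper's encrypted labels (assumption).
    return hmac.new(key, data.encode("utf-8"), hashlib.sha256).hexdigest()


def sanitize(xml_text: str, key: bytes) -> Iterator[Label]:
    """Yield one opaque label per node, in document order."""
    root = ET.fromstring(xml_text)
    stack = [(root, "0")]
    while stack:
        node, path = stack.pop()
        text = (node.text or "").strip()
        yield Label(_mac(key, path), _mac(key, f"{node.tag}|{text}"))
        children = list(node)
        # Push children in reverse so they are popped in document order.
        for i in reversed(range(len(children))):
            stack.append((children[i], f"{path}.{i}"))


def edit_script(old: Iterable[Label], new: Iterable[Label]) -> List[Tuple[str, str]]:
    """Compare two label streams without access to the key or the content."""
    old_by_pos = {label.position: label.digest for label in old}
    seen = set()
    script = []
    for label in new:  # the new tree is consumed lazily, node by node
        seen.add(label.position)
        if label.position not in old_by_pos:
            script.append(("insert", label.position))
        elif old_by_pos[label.position] != label.digest:
            script.append(("update", label.position))
    script.extend(("delete", pos) for pos in old_by_pos if pos not in seen)
    return script


key = b"shared-secret"  # known only to the legitimate parties
old_doc = "<a><b>1</b><c>2</c></a>"
new_doc = "<a><b>1</b><c>3</c><d/></a>"
script = edit_script(sanitize(old_doc, key), sanitize(new_doc, key))
operations = [op for op, _ in script]
```

In this sketch the party running `edit_script` sees only the tokens, so it can detect that a node changed or appeared without learning tag names or text. Because labels are tied to positions, inserting a node in the middle of a sibling list would show up as a run of updates; a real minimum-cost technique would need to handle that case differently.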

Similar resources

Nearest Neighbour Search For XML Trees

There is significant research effort on efficiently computing similarities between objects of non-traditional data types such as strings, documents, sound tracks, or pictures. It is reasonable to use the results of these efforts for the problem of XML tree matching, too. As an XML document has a tree structure and the trees can be transformed into a linear structure, a tree can be regarded as a spe...

Mining XML Frequent Query Patterns

With XML being the standard for data encoding and exchange over the Internet, how to find interesting XML query characteristics efficiently becomes a critical issue. Mining frequent query patterns is a technique to discover the most frequently occurring query pattern trees from a large collection of XML queries. In this paper, we describe an efficient mining algorithm to discover the frequent que...

Treeguide Index: Enabling Efficient XML Query Processing

XML DBMSs require new indexing techniques to efficiently process structural search and full-text search as integrated in XQuery. Much research has been done on indexing XML documents. In this paper, we first survey some of these techniques and suggest a classification scheme. It appears that most techniques index paths in XML documents and maintain a separate index on values. In some cases, the...

Efficient XML Structural Similarity Detection using Sub-tree Commonalities

Developing efficient techniques for comparing XML-based documents becomes essential in the database and information retrieval communities. Various algorithms for comparing hierarchically structured data, e.g. XML documents, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being modeled as ordered label...

Efficient Structural Joins on Indexed XML Documents

Queries on XML documents typically combine selections on element contents and, via path expressions, the structural relationships between tagged elements. Structural joins are used to find all pairs of elements satisfying the primitive structural relationships specified in the query, namely, parent–child and ancestor–descendant relationships. Efficient support for structural joins is thus the...

Publication date: 2009